Natural Langevin Dynamics for Neural Networks

Authors

  • Gaétan Marceau-Caron
  • Yann Ollivier

Affiliations: MILA, Université de Montréal, Canada; CNRS, TAU, Université Paris-Saclay, France

Abstract

One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is a standard stochastic gradient descent to which is added a controlled amount of noise, specifically scaled so that the parameter converges in law to the posterior distribution [WT11, TTV16]. The posterior predictive distribution can be approximated by an ensemble of samples from the trajectory. Choice of the variance of the noise is known to impact the practical behavior of SGLD: for instance, noise should be smaller for sensitive parameter directions. Theoretically, it has been suggested to use the inverse Fisher information matrix of the model as the variance of the noise, since it is also the variance of the Bayesian posterior [PT13, AKW12, GC11]. But the Fisher matrix is costly to compute for large-dimensional models. Here we use the easily computed Fisher matrix approximations for deep neural networks from [MO16, Oll15]. The resulting natural Langevin dynamics combines the advantages of Amari's natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks. Small-scale experiments on MNIST show that Fisher matrix preconditioning brings SGLD close to dropout as a regularizing technique.

Consider a supervised learning problem with a dataset $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ of $N$ input-output pairs, to be modelled by a parametric probabilistic distribution $y_i \sim p_\theta(y|x_i)$ ($x = \emptyset$ amounts to unsupervised learning of $y$). Defining the log-loss $l_\theta(y_i|x_i) := -\ln p_\theta(y_i|x_i)$, the maximum likelihood estimator is the value $\theta$ that minimizes $\mathbb{E}_{(x,y)\in D}\, l_\theta(y|x)$, where $\mathbb{E}_{(x,y)\in D}$ denotes averaging over the dataset. Stochastic gradient descent is often used to tackle this minimization problem for large-scale datasets [BL03, Bot10]. This consists in iterating

$$\theta \leftarrow \theta - \eta\, \hat{\mathbb{E}}_{(x,y)\in D}\, \partial_\theta l_\theta(y|x), \qquad (1)$$
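As a rough illustration of iteration (1) and of the noise that SGLD adds to it, the following is a minimal sketch, not the authors' implementation. It makes several assumptions that are not in the text above: the hatted average in (1) is read as an average over a random minibatch, the prior is flat, and the noise is Gaussian with variance 2η/N, which is the scaling of [WT11] rewritten for a gradient averaged over the data. The toy model (a unit-variance Gaussian with unknown mean θ) and the names `sgd_step`, `sgld_step`, and `run_sgld` are hypothetical and chosen only for this example.

```python
import numpy as np


def sgd_step(theta, grad_loss, eta):
    """One step of iteration (1): theta <- theta - eta * (minibatch-averaged gradient)."""
    return theta - eta * grad_loss


def sgld_step(theta, grad_loss, eta, n_data, rng):
    """SGD step plus Gaussian noise of variance 2 * eta / n_data.

    Assumption: flat prior and a gradient averaged (not summed) over the data,
    so that the stationary law approximates the Bayesian posterior.
    """
    noise = rng.standard_normal(np.shape(theta))
    return sgd_step(theta, grad_loss, eta) + np.sqrt(2.0 * eta / n_data) * noise


def run_sgld(y, eta=0.02, batch_size=10, n_steps=20000, burn_in=2000, seed=0):
    """Sample the mean of a unit-variance Gaussian model (hypothetical toy problem).

    The log-loss is l_theta(y) = 0.5 * (y - theta)**2 + const, so its gradient
    with respect to theta is (theta - y).
    """
    rng = np.random.default_rng(seed)
    n_data = len(y)
    theta = 0.0
    samples = []
    for step in range(n_steps):
        # Minibatch drawn uniformly with replacement from the dataset.
        batch = y[rng.integers(0, n_data, size=batch_size)]
        grad = np.mean(theta - batch)
        theta = sgld_step(theta, grad, eta, n_data, rng)
        if step >= burn_in:
            samples.append(float(theta))
    return np.array(samples)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=100)
    draws = run_sgld(data)
    # For this toy model with a flat prior, the exact posterior is
    # Gaussian with mean data.mean() and variance 1 / len(data).
    print("sample mean:", draws.mean(), "data mean:", data.mean())
    print("sample variance:", draws.var(), "1/N:", 1.0 / len(data))
```

In this toy setting the samples along the trajectory should scatter around the data mean with a variance of roughly 1/N; minibatch noise and the finite step size inflate that variance slightly, which is one reason the step size matters in practice.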


Similar resources

Gyration Radius and Energy Study at Different Temperatures for Acetylcholine Receptor Protein in Gas Phase by Monte Carlo, Molecular and Langevin Dynamics Simulations

The determination of gyration radius is a strong research for configuration of a Macromolecule. It also reflects molecular compactness shape. In this work, to characterize the behavior of the protein, we observe quantities such as the radius of gyration and the average energy. We studied the changes of these factors as a function of temperature for Acetylcholine receptor protein in gas phase with n...


Energy study at different solvents for potassium Channel Protein by Monte Carlo, Molecular and Langevin Dynamics Simulations

Potassium Channels allow potassium flux and are essential for the generation of electric current across excitable membranes. Potassium Channels are also the targets of various intracellular control mechanisms; such that the suboptimal regulation of channel function might be related to pathological conditions. Realistic studies of ion current in biologic channels present a major challenge for compu...


Pareto Optimization of Two-element Wing Models with Morphing Flap Using Computational Fluid Dynamics, Grouped Method of Data handling Artificial Neural Networks and Genetic Algorithms

A multi-objective optimization (MOO) of two-element wing models with morphing flap by using computational fluid dynamics (CFD) techniques, artificial neural networks (ANN), and non-dominated sorting genetic algorithms (NSGA II), is performed in this paper. At first, the domain is solved numerically in various two-element wing models with morphing flap using CFD techniques and lift (L) and drag ...


Investigation of Monte Carlo, Molecular Dynamic and Langevin dynamic simulation methods for Albumin-Methanol system and Albumin-Water system

Serum Albumin is the most abundant protein in blood plasma. Its two major roles are maintaining osmotic pressure and depositing and transporting compounds. In this paper, Albumin-methanol solution simulation is carried out by three techniques including Monte Carlo (MC), Molecular Dynamic (MD) and Langevin Dynamic (LD) simulations. By investigating energy changes by time and temperature (between 27...


Adaptive Leader-Following and Leaderless Consensus of a Class of Nonlinear Systems Using Neural Networks

This paper deals with leader-following and leaderless consensus problems of high-order multi-input/multi-output (MIMO) multi-agent systems with unknown nonlinear dynamics in the presence of uncertain external disturbances. The agents may have different dynamics and communicate together under a directed graph. A distributed adaptive method is designed for both cases. The structures of the contro...




Publication date: 2017